Applying a Dynamic Bayesian Network Framework to Transliteration Identification

Author

Peter Nabende

Abstract

Transliteration identification aims to enrich multilingual lexicons and improve performance in various Natural Language Processing (NLP) applications, including Cross Language Information Retrieval (CLIR) and Machine Translation (MT). This paper describes work on applying the widely used graphical models approach of Dynamic Bayesian Networks (DBNs) to transliteration identification. The task of estimating transliteration similarity is not very different from specific identification tasks where DBNs have been successfully applied, and DBN models from those other identification domains can be adapted to the transliteration identification domain. In particular, we investigate the applicability of a DBN framework initially proposed by Filali and Bilmes (2005) to learn edit distance estimation parameters for use in pronunciation classification. The DBN framework enables the specification of a variety of models representing different factors that can affect string similarity estimation. Three DBN models associated with two of the DBN classes originally specified by Filali and Bilmes (2005) have been tested on an experimental setup for Russian-English transliteration identification. Two of the DBN models achieve high transliteration identification accuracy, and combining the models leads to even higher accuracy.
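To make the notion of edit-distance-based string similarity in the abstract concrete, the following is a minimal sketch of a weighted edit distance between a Cyrillic string and a Latin string. It is not the DBN model of Filali and Bilmes (2005) or the author's implementation: the cost table `TOY_MATCHES`, the function names, and the fixed insertion/deletion cost are hypothetical and chosen only for illustration. A learned model would estimate such costs from training pairs rather than hard-code them.

```python
# Hypothetical toy table of character pairs treated as "free" substitutions.
# A real system would learn substitution costs from data instead.
TOY_MATCHES = {("п", "p"), ("у", "u"), ("т", "t"), ("и", "i"), ("н", "n")}


def substitution_cost(src_char, tgt_char):
    """Return 0.0 for a pair in the toy table, 1.0 otherwise (assumed costs)."""
    return 0.0 if (src_char, tgt_char) in TOY_MATCHES else 1.0


def weighted_edit_distance(source, target, indel_cost=1.0):
    """Classic dynamic-programming edit distance with custom substitution costs."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    # Transforming a prefix into the empty string requires only deletions/insertions.
    for i in range(1, rows):
        dist[i][0] = i * indel_cost
    for j in range(1, cols):
        dist[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,  # delete a source character
                dist[i][j - 1] + indel_cost,  # insert a target character
                dist[i - 1][j - 1] + substitution_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]
```

Under these assumed costs, a lower distance suggests a more plausible transliteration pair; for example, `weighted_edit_distance("путин", "putin")` is smaller than `weighted_edit_distance("путин", "obama")`.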

Similar resources

Applying Dynamic Bayesian Networks in Transliteration Identification

Mining Transliterations from Wikipedia using Dynamic Bayesian Networks

Transliteration mining is aimed at building high quality multi-lingual named entity (NE) lexicons for improving performance in various Natural Language Processing (NLP) tasks including Machine Translation (MT) and Cross Language Information Retrieval (CLIR). In this paper, we apply two Dynamic Bayesian network (DBN)-based edit distance (ED) approaches in mining transliteration pairs from Wikipe...

Evaluation of Dynamic Bayesian Network models for Entity Name Transliteration

This paper proposes an evaluation of DBN models so as to identify DBN configurations that can improve machine transliteration accuracy.

A Bayesian model of bilingual segmentation for transliteration

In this paper we propose a novel Bayesian model for unsupervised bilingual character sequence segmentation of corpora for transliteration. The system is based on a Dirichlet process model trained using Bayesian inference through blocked Gibbs sampling implemented using an efficient forward filtering/backward sampling dynamic programming algorithm. The Bayesian approach is able to overcome the o...

Dynamic Bayesian Networks for Transliteration Discovery and Generation

This project is involved with extraction and transliteration of entity names between languages that use different writing systems (or alphabets). Extraction involves the automatic identification of sequences in parallel/comparable corpora/text that can be considered as proper entity names. On the other hand, transliteration generation involves automatic transformation of a source language name ...

Publication date: 2010
